Parquet

You can load the output of an ETL into Parquet files and save them to a shared folder location. When the ETL is run, the data is loaded into Parquet files in the given folder: each table in the data flow is loaded into its own Parquet file, and a corresponding .crc checksum file is generated for each one. To use a Parquet file as a target, add the Parquet node from the Targets panel to the data flow.

Configure a Parquet Target

From the target's Properties panel, name the new database that will be created and point to the shared folder where the files will be saved:

  • Database Name: name the new database that will be generated when the ETL is run.
  • Shared Folder Path: provide a pointer to a shared folder where the new database will be saved.
  • Create Folders: generate folders and save the database files inside them:
    • Database Name: create a folder named according to the given database name, and save the database file inside this folder.
    • Date Time: create a folder named according to the date and time at which the ETL is run, and save the database file inside this folder. If a database folder is also created, the Date Time folder will be a subfolder of it.
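The two Create Folders options compose the output path in order: the database-name folder first, then the run-timestamp subfolder inside it. The following sketch shows this composition; the function name and timestamp format are hypothetical illustrations, not the product's actual naming scheme:

```python
from datetime import datetime
from pathlib import Path


def output_folder(shared_path, db_name, use_db_folder, use_datetime_folder, run_time):
    """Hypothetical sketch of how the Create Folders options build the path."""
    path = Path(shared_path)
    if use_db_folder:
        # "Database Name" option: folder named after the database.
        path = path / db_name
    if use_datetime_folder:
        # "Date Time" option: subfolder named for the run timestamp
        # (format here is an assumption for illustration).
        path = path / run_time.strftime("%Y-%m-%d_%H-%M-%S")
    return path


run = datetime(2024, 1, 15, 9, 30, 0)
print(output_folder("/shared/etl", "Sales", True, True, run))
```

With both options enabled, the files land under the database folder's timestamped subfolder; with neither enabled, they are written directly to the shared folder.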

Finally, click ‘Connect All’ to connect the target node to the data flow. As usual, you can add a description to the node's Properties panel.

Description

Expand the Description window to add a description or notes to the node. The description is visible only from the Properties panel of the node, and does not produce any outputs. This is a useful way to document the ETL pipeline for yourself and other users.

Run the ETL

Because a Parquet target has no database or in-memory destination, the Data Model and Security stages are not relevant. Skip these steps and simply run the ETL from the Data Flow.

  • Click here to learn how to process the ETL.